{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"img/train_model.png\" width=\"90%\" align=\"left\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Choose the Right Algorithm!\n",
    "\n",
    "Below is the Scikit-Learn Cheat Sheet\n",
    "\n",
    "![Scikit-Learn Cheat Sheet](https://scikit-learn.org/stable/_static/ml_map.png)\n",
    "\n",
    "https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Types of Machine Learning Problems\n",
    "\n",
    "**Use Cases**\n",
    "\n",
    "* _Classification_. The goal in classification is to take input values and organize them into two or more categories. An example classification use case is fraud detection. In fraud detection, the goal is to take information about the transaction and use it to determine if the transaction is either fraudulent or not fraudulent. When XGBoost is given a dataset of past transactions and whether or not they were fraudulent, it can learn a function that maps input transaction data to the probability that transaction was fraudulent.\n",
    "* _Regression_. In regression, instead of mapping inputs to a discrete number of classes, the goal is output a number. An example regression problem is predicting the price that a house will sell for. When a regression algorithm is given historical data about houses and selling prices, it can learn a function that predicts the selling price of a house given the corresponding metadata about the house.\n",
    "* _Ranking_. Suppose you are given a query and a set of documents. In ranking, the goal is to find the relative importance of the documents and order them based on relevance. An example use case of ranking is a product search for an ecommerce website. You could leverage data about search results, clicks, and successful purchases, and then apply XGBoost for training. This produces a model that gives relevance scores for the searched products.\n",
    "\n",
    "**Features of Amazon SageMaker Built-In Algorithms**\n",
    "\n",
    "* _Out-of-the-box distributed training_.  Amazon SageMaker Built-Ins allow customers to train massive data sets on multiple machines. Just specify the number and size of machines on which you want to scale out, and Amazon SageMaker will take care of distributing the data and training process.\n",
    "* _Sharded by Amazon S3 key training_. Sharded by Amazon S3 key training requires you to partition your data on Amazon S3. This allows Amazon SageMaker to download each partition of the dataset to individual nodes rather than downloading all the data on all nodes. This saves time in downloading the dataset from Amazon S3 and ultimately speeds up training jobs.\n",
    "* _Spark integration with the Spark SDK_. The SageMaker Spark SDK provides a concise API for developers to interact with Amazon SageMaker XGBoost. Developers can first preprocess data on Apache Spark, then call a SageMaker Built-In Algorithm directly from their Spark environment. This will spins up Amazon SageMaker training instances and uses them to train models on the data that was already preprocessed with Spark.\n",
    "* _Easy deployment and managed model hosting_. After a model is trained, you need only one API call to deploy it to production. The Amazon SageMaker hosting environment is managed, and it can be configured for auto scaling, which reduces the operational overhead of running a hosting environment.\n",
    "* _Native A/B Testing_. Using Amazon SageMaker hosting, you can run multiple models, each with different weights for inference. The A/B testing helps customers determine the best models for their use case.\n",
    "\n",
    "**Custom Code in SageMaker (aka \"Script Mode\")**\n",
    "You can run your own scripts inside SageMaker to inherit the benefits of the managed SageMaker infrstructure.  Scripts are provided with specific environment variables including NUM_GPUS (multi-gpu instances), NUM_HOSTS (distributed training), etc.  Here is the full list:  https://github.com/aws/sagemaker-containers#list-of-provided-environment-variables-by-sagemaker-containers\n",
    "\n",
    "All major AI/ML frameworks are supported by SageMaker including the following (with links to their open source Dockerfiles):\n",
    "* [TensorFlow/Keras](https://github.com/aws/sagemaker-tensorflow-container/tree/script-mode)\n",
    "* [PyTorch](https://github.com/aws/sagemaker-pytorch-container)\n",
    "* [MXNet](https://github.com/aws/sagemaker-mxnet-container)\n",
    "* [Chainer](https://github.com/aws/sagemaker-chainer-container)\n",
    "* [Scikit-Learn](https://github.com/aws/sagemaker-scikit-learn-container)\n",
    "* [XGBoost](https://github.com/aws/sagemaker-xgboost-container)\n",
    "* [Spark ML](https://github.com/aws/sagemaker-sparkml-serving-container)\n",
    "* [Reinforcement Learning](https://github.com/aws/sagemaker-rl-container)\n",
    "\n",
    "Users can provide their own `requirements.txt` to define custom Python libraries.\n",
    "\n",
    "**Custom Containers in SageMaker (aka Bring Your Own Container)**\n",
    "You can use your own container, as well.  Simply provide a Docker image that contains your model and dependencies - and SageMaker will do the rest!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Release Resources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%html\n",
    "\n",
    "<p><b>Shutting down your kernel for this notebook to release resources.</b></p>\n",
    "<button class=\"sm-command-button\" data-commandlinker-command=\"kernelmenu:shutdown\" style=\"display:none;\">Shutdown Kernel</button>\n",
    "        \n",
    "<script>\n",
    "try {\n",
    "    els = document.getElementsByClassName(\"sm-command-button\");\n",
    "    els[0].click();\n",
    "}\n",
    "catch(err) {\n",
    "    // NoOp\n",
    "}    \n",
    "</script>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%javascript\n",
    "\n",
    "try {\n",
    "    Jupyter.notebook.save_checkpoint();\n",
    "    Jupyter.notebook.session.delete();\n",
    "}\n",
    "catch(err) {\n",
    "    // NoOp\n",
    "}"
   ]
  }
 ],
 "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
